Atty. Dkt. No. A3358Q-US-NP 

XERZ2 01374 

AMENDMENTS TO THE CLAIMS 

The listing of claims will replace all prior versions, and listings, of claims in the 
application: 

LISTING OF CLAIMS: 

1. (Currently amended) A method for computing a measure of similarity between 
a first (or input) document and one or more disparate second (or search results) 
documents, comprising: 

(a) receiving a first list of rated keywords extracted from the first document and a 
second list of rated keywords extracted from each of the one or more disparate 
documents; 

(b) using the first list of rated keywords and the list of rated keywords from each of 
the one or more disparate documents to determine whether the first document forms part 
of the one or more disparate documents using a first computed percentage indicating what 
percentage of keyword ratings in the first list also exist in the second list of at least one of 
the one or more disparate documents; 

(c) computing a second percentage for each of the one or more disparate 
documents indicating what percentage of keyword ratings along with a set of their 
neighboring keyword ratings in the first list also exist in the second list for at least one of 
the one or more disparate documents when the first computed percentage indicates that 
the first document is included in the at least one of the one or more disparate 
documents; 

(d) using the first computed percentage to specify the measure of similarity when the 
second computed percentage for at least one of the one or more disparate documents is 
greater than the first computed percentage; and 

(e) ranking the one or more disparate documents based on the percentage 
computed indicating what percentage of keyword ratings along with a set of their 
neighboring keyword ratings in the first list also exist in the list for at least one of the one or 
more disparate documents. 

2. (Original) The method according to claim 1, wherein the second percentage 
at (c) is computed by giving weight only to those keywords and their set of neighboring 
keywords in the first list that match in the second list and a threshold percentage of the 
keywords in their set of neighboring keywords. 
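
Illustrative note (not part of the claim language): the weighting recited in claim 2 can be sketched as follows. The claims do not define how a keyword's "set of neighboring keywords" is formed, so this sketch assumes neighbors are the keywords within a fixed number of positions in list order. The names `neighbors`, `neighbor_percentage`, `window`, and `threshold` are hypothetical, and a keyword with no neighbors is simply counted as matching.

```python
def neighbors(keywords, index, window=2):
    """Keywords within `window` positions of `index`, excluding the keyword itself.

    Assumption: neighborhood is defined by position in the keyword list.
    """
    lo = max(0, index - window)
    hi = min(len(keywords), index + window + 1)
    return {keywords[i] for i in range(lo, hi) if i != index}


def neighbor_percentage(first, second, window=2, threshold=0.5):
    """Percentage of first-list weight whose keyword and enough neighbors match.

    `first` and `second` are lists of (keyword, weight) pairs in document order.
    A keyword's weight counts only if the keyword occurs in the second list and
    at least `threshold` of its first-list neighbors are also its neighbors there.
    """
    first_words = [k for k, _ in first]
    second_words = [k for k, _ in second]

    # Position of the first occurrence of each keyword in the second list.
    second_index = {}
    for i, k in enumerate(second_words):
        second_index.setdefault(k, i)

    total = sum(w for _, w in first)
    if total == 0:
        return 0.0

    matched = 0.0
    for i, (k, w) in enumerate(first):
        j = second_index.get(k)
        if j is None:
            continue  # keyword absent from the second list: no weight
        near_first = neighbors(first_words, i, window)
        near_second = neighbors(second_words, j, window)
        # Assumption: a keyword with no neighbors is treated as fully matched.
        shared = len(near_first & near_second) / len(near_first) if near_first else 1.0
        if shared >= threshold:
            matched += w
    return 100.0 * matched / total
```

Lowering `threshold` makes the match more tolerant of missing neighbors, which is the kind of adjustment claim 4 recites for OCR-derived keyword lists.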

3. (Original) The method according to claim 2, wherein the second percentage 
at (c) is computed by giving full weight to those keywords in the first list of rated keywords 
that cannot be accurately identified as having a complete set of neighboring keywords in 
the second set of keywords. 

4. (Original) The method according to claim 2, wherein the threshold percentage 
is reduced when the first list of rated keywords is identified using OCR. 

5. (Currently amended) The method according to claim 1, further comprising (f) 
if the first computed percentage does not indicate that the first document is included in 
the second document, computing a third percentage using the Jaccard distance measure. 
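
Illustrative note (not part of the claim language): a minimal sketch of a Jaccard-based percentage over two keyword lists. The claim does not state whether keyword weights enter the computation or whether the similarity or the distance form is reported, so this sketch assumes unweighted keyword sets and exposes both; the function names are hypothetical.

```python
def jaccard_similarity_percentage(first_keywords, second_keywords):
    """100 * |A intersect B| / |A union B| over the two keyword sets."""
    a, b = set(first_keywords), set(second_keywords)
    union = a | b
    if not union:
        # Assumption: two empty keyword lists are treated as 0% similar.
        return 0.0
    return 100.0 * len(a & b) / len(union)


def jaccard_distance_percentage(first_keywords, second_keywords):
    """Jaccard distance expressed as a percentage (100 minus the similarity)."""
    return 100.0 - jaccard_similarity_percentage(first_keywords, second_keywords)
```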

6. (Currently amended) The method according to claim 5, further comprising (g) 
if the third computed percentage indicates that the first document is a revision of the 
second document, computing a fourth percentage indicating what percentage of keyword 
ratings along with a set of their neighboring keyword ratings in the second list also exist in 
the first list. 

7. (Original) The method according to claim 6, further comprising using the 
fourth computed percentage to specify the measure of similarity except when: (i) the fourth 
computed percentage is greater than the second computed percentage; (ii) the first list of 
rated keywords is identified using OCR; (iii) the fourth computed percentage is greater than 
fifty percent; and (iv) less than twenty percent of the keywords in the first list of keywords 
are in the second list of keywords. 
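
Illustrative note (not part of the claim language): the exception recited in claim 7 can be read as a single predicate. This sketch assumes that conditions (i) through (iv) are conjunctive, since the claim joins them with "and", and that condition (iv) is measured over distinct keywords. The function and parameter names are hypothetical, and the sketch only reports whether the fourth percentage would be used; the claim does not say which measure applies otherwise.

```python
def use_fourth_percentage(fourth, second, first_list_from_ocr,
                          first_keywords, second_keywords):
    """Return True when the fourth percentage would specify the similarity.

    `fourth` and `second` are percentages in the range 0..100.
    """
    first_set = set(first_keywords)
    # Fraction of distinct first-list keywords that also occur in the second list.
    overlap = (len(first_set & set(second_keywords)) / len(first_set)
               if first_set else 0.0)
    exception = (
        fourth > second            # (i)
        and first_list_from_ocr    # (ii)
        and fourth > 50.0          # (iii)
        and overlap < 0.20         # (iv)
    )
    return not exception
```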

8. (Original) The method according to claim 1, wherein the first computed 
percentage indicates that the first document is included in the second document when the 
percentage defined by ratio of Sum1/Sum2 is greater than approximately ninety percent, 
where: D1 is the number of keywords in first list of keywords; D2 is the number of keywords 
in the second list of keywords; Sum1 is the sum of the weights of keywords that appear in 
D1 that also appear in D2; Sum2 is the sum of the weights of keywords in D1. 
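
Illustrative note (not part of the claim language): the Sum1/Sum2 ratio of claim 8 can be sketched as below. The representation of the first list as a keyword-to-weight mapping, the function names, and the exact cutoff of 90.0 (the claim says "approximately ninety percent") are assumptions made for illustration.

```python
def inclusion_percentage(first, second_keywords):
    """100 * Sum1 / Sum2 for a first list given as {keyword: weight}.

    Sum1: total weight of first-list keywords that also appear in the second list.
    Sum2: total weight of all first-list keywords.
    """
    second_set = set(second_keywords)
    sum2 = sum(first.values())
    if sum2 == 0:
        return 0.0
    sum1 = sum(w for k, w in first.items() if k in second_set)
    return 100.0 * sum1 / sum2


def is_included(first, second_keywords, cutoff=90.0):
    """True when the ratio exceeds the (assumed) ninety percent cutoff."""
    return inclusion_percentage(first, second_keywords) > cutoff
```

For example, with first-list weights of 3.0, 5.0, and 2.0 where only the first two keywords appear in the second list, Sum1 is 8.0 and Sum2 is 10.0, giving 80 percent, which is below the cutoff.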

9. (Original) The method according to claim 1, wherein the first list of rated 
keywords includes one or more keywords translated from a second language different from 
a first language that is identified as being a primary language of the first document. 

10. (Original) The method according to claim 1, wherein the first document is a 
portion of the second document. 

11. (Currently amended) A computer-based system for computing a measure of 
similarity between a first (or input) document and one or more second (or search results) 
documents, comprising: 

(a) means for receiving a first list of rated keywords extracted from the first 
document and a second list of rated keywords extracted from each of the one or more 
disparate documents, wherein keywords are rated at least in part by 
a relevant weight from their associated document language; 

(b) means for using the first list of rated keywords and the list of keywords from each 
of the one or more disparate documents to determine whether the first document forms 
part of the one or more disparate documents using a first computed percentage indicating 
what percentage of keyword ratings in the first list also exist in the second list of at least 
one of the one or more disparate documents; 

(c) means for computing a second percentage for each of the one or more disparate 
documents indicating what percentage of keyword ratings along with a set of their 
neighboring keyword ratings in the first list also exist in the second list for at least one of 
the one or more disparate documents when the first computed percentage indicates that 
the first document is included in the at least one of the one or more disparate 
documents; 

(d) means for using the first computed percentage to specify the measure of 
similarity when the second computed percentage for at least one of the one or more 
disparate documents is greater than the first computed percentage; and 

(e) means for ranking the one or more disparate documents based on the 
percentage computed indicating what percentage of keyword ratings along with a set of 
their neighboring keyword ratings in the first list also exist in the list for at least one of the 
one or more disparate documents. 

12. (Original) The system according to claim 11, wherein the second percentage 
at (c) is computed by said computing means by giving weight only to those keywords and 
their set of neighboring keywords in the first list that match in the second list and a 
threshold percentage of the keywords in their set of neighboring keywords. 

13. (Original) The system according to claim 12, wherein the second percentage 
at (c) is computed by said computing means by giving full weight to those keywords in the 
first list of rated keywords that cannot be accurately identified as having a complete set of 
neighboring keywords in the second set of keywords. 

14. (Original) The system according to claim 12, wherein the threshold 
percentage is reduced when the first list of rated keywords is identified using OCR. 

15. (Currently amended) The system according to claim 11, further comprising 
(f) if the first computed percentage does not indicate that the first document is 
included in the second document, means computes a third percentage using the Jaccard 
distance measure. 

16. (Original) The system according to claim 15, further comprising (f) if the third 
computed percentage indicates that the first document is a revision of the second 
document, means computes a fourth percentage indicating what percentage of keyword 
ratings along with a set of their neighboring keyword ratings in the second list also exist in 
the first list. 

17. (Original) The system according to claim 16, further comprising means for 
using the fourth computed percentage to specify the measure of similarity except when: (i) 
the fourth computed percentage is greater than the second computed percentage; (ii) the 
first list of rated keywords is identified using OCR; (iii) the fourth computed percentage is 
greater than fifty percent; and (iv) less than twenty percent of the keywords in the first list of 
keywords are in the second list of keywords. 

18. (Original) The system according to claim 11, wherein the first computed 
percentage indicates that the first document is included in the second document when the 
percentage defined by ratio of Sum1/Sum2 is greater than approximately ninety percent, 
where: D1 is the number of keywords in first list of keywords; D2 is the number of keywords 
in the second list of keywords; Sum1 is the sum of the weights of keywords that appear in 
D1 that also appear in D2; Sum2 is the sum of the weights of keywords in D1. 

19. (Original) The system according to claim 11, wherein the first list of rated 
keywords includes one or more keywords translated from a second language different from 
a first language that is identified as being a primary language of the first document. 

20. (Currently amended) An article of manufacture for computing a measure of 
similarity between a first (or input) document and a second (or search results) document, 
the article of manufacture comprising computer usable media including computer readable 
instructions embedded therein that causes a computer to perform a method, wherein the 
method comprises: 

(a) receiving a first list of rated keywords extracted from the first document and a 
second list of rated keywords extracted from the second document; 

(b) using the first and second lists of rated keywords to determine whether the first 
document forms part of the second document using a first computed percentage indicating 
what percentage of keyword ratings in the first list also exist in the second list; 

(c) computing a second percentage indicating what percentage of keyword ratings 
along with a set of their neighboring keyword ratings in the first list also exist in the second 
list when the first computed percentage indicates that the first document is included in the 
second document; 

(d) using the first computed percentage to specify the measure of similarity when the 
second computed percentage is greater than the first computed percentage; 

(e) if the first computed percentage does not indicate that the first document is 
included in the second document, computing a third percentage using the Jaccard distance 
measure; and 

(f) if the third computed percentage indicates that the first document is a revision of 
the second document, computing a fourth percentage indicating what percentage of 
keyword ratings along with a set of their neighboring keyword ratings in the second list also 
exist in the first list, the fourth computed percentage is used to specify the measure of 
similarity except when: (i) the fourth computed percentage is greater than the second 
computed percentage; (ii) the first list of rated keywords is identified using OCR; (iii) the 
fourth computed percentage is greater than fifty percent; and (iv) less than twenty percent 
of the keywords in the first list of keywords are in the second list of keywords. 